Coding for DS and DM
R coding module

Lecture 5

Andrea Cappozzo
andrea.cappozzo@unimi.it
AndreaCappozzo
andreacappozzo.rbind.io

Graphical representations

  • The figure on the previous slide is taken from data-to-viz.com

  • Different graphical representations are available depending on the type of variable (e.g., numeric or categorical)

  • Conveying information through plots is an art in itself (data visualization)

  • It is impossible to cover everything, even in an entire course devoted to data visualization

  • A picture is worth a thousand words

Simple Plots with plot()

  • The simplest (and probably most important) R function for creating a plot is plot().

  • plot() is a generic function: its behavior depends on the class of the object passed as input. Let’s start with a basic scatter plot.

Our first plot, a scatter plot:

x1 <- 1:10
y1 <- 10:1
plot(x1, y1)
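Because plot() is an S3 generic, the same call produces different displays for objects of different classes. The sketch below illustrates three such dispatches; the object names f and df are illustrative, and set.seed(1) is added only for reproducibility. Running methods(plot) lists the methods available in your session.

```r
set.seed(1)  ##  only for reproducibility
##  factor: bar chart of the level counts
f <- factor(c("a", "b", "a", "c", "a"))
plot(f)
##  data frame with numeric columns: scatter plot matrix of all column pairs
df <- data.frame(u = rnorm(20), v = rnorm(20), w = rnorm(20))
plot(df)
##  function: drawn as a curve over [from, to]
plot(sin, from = 0, to = 2 * pi)
```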

Modifying the Appearance of a Plot

  • The help page ?plot.default lists the arguments and graphical parameters that modify the appearance of a plot (many more graphical parameters are documented in ?par).

##  Customize the scatter plot: colour (col), symbol size (cex), symbol type (pch), axis labels and title
plot(x1, y1, col = "red", cex = 2, pch = 20, xlab = "Hi!", ylab = "", main = "My second plot!")
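The pch argument takes an integer code for the point symbol. A minimal sketch of a reference chart for the 26 built-in codes (0 to 25); the variable name pch_codes and the two-row layout are illustrative choices. Symbols 21 to 25 can additionally be filled through the bg argument.

```r
##  Reference chart of the built-in point symbols, pch = 0, ..., 25
pch_codes <- 0:25
##  Arrange the symbols on two rows of 13
plot(pch_codes %% 13, -(pch_codes %/% 13), pch = pch_codes, cex = 2,
     xlim = c(-0.5, 12.5), ylim = c(-1.5, 0.5),
     axes = FALSE, xlab = "", ylab = "")
##  Write each code below its symbol
text(pch_codes %% 13, -(pch_codes %/% 13), labels = pch_codes, pos = 1, offset = 1)
```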

Adding Points to an Existing Plot

  • We can add a new set of points to an existing plot with the points() function.

##  Add points to the existing plot
x2 <- runif(100, min = 0, max = 10)
y2 <- runif(100, min = 0, max = 10)
plot(x1, y1, col = "red", cex = 2, pch = 20)
points(x2, y2, col = "blue", cex = 2, pch = 20)
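points() draws on the current plot without changing its axis limits, so points falling outside the region set by the first plot() call are silently clipped (here, x2 and y2 can take values below 1, the minimum of x1 and y1). A sketch, assuming we want both sets fully visible: compute the limits from both sets before plotting, and add a legend. The seed and the names xlim_all and ylim_all are illustrative.

```r
set.seed(1)  ##  only for reproducibility
x1 <- 1:10
y1 <- 10:1
x2 <- runif(100, min = 0, max = 10)
y2 <- runif(100, min = 0, max = 10)
##  points() never rescales the axes: compute limits covering both sets first
xlim_all <- range(x1, x2)
ylim_all <- range(y1, y2)
plot(x1, y1, col = "red", cex = 2, pch = 20, xlim = xlim_all, ylim = ylim_all)
points(x2, y2, col = "blue", cex = 2, pch = 20)
legend("topright", legend = c("(x1, y1)", "(x2, y2)"), col = c("red", "blue"), pch = 20)
```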

Multi-Plot Layouts

  • To display the two sets of points in separate plots within the same figure, we can set the mfrow graphical parameter with par().

##  Create a multi-plot layout
par(mfrow = c(1, 2)) ##  1 row and 2 columns filled by row
plot(x1, y1, type = "b", col = "red", cex = 2, pch = 20) ##  type = "b": both points and lines
plot(x2, y2, col = "blue", cex = 2, pch = 20)

Reset Plot Settings

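  • To restore the default single-panel layout, reset the parameter explicitly, or close the graphics device altogether (the next plot then opens a new device with default parameters).
##  Reset the layout to 1 row and 1 column
par(mfrow = c(1, 1))
##  ...or close the current graphics device
dev.off()
##  null device 
##            1 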
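A common idiom, sketched below: par() invisibly returns the previous values of the parameters it changes, so they can be saved and restored after drawing. The 2 x 2 grid, the random samples and the name old_par are illustrative; mfcol would fill the same grid column by column instead of row by row.

```r
##  par() invisibly returns the old values of the parameters it modifies
old_par <- par(mfrow = c(2, 2))   ##  2 x 2 grid, filled row by row
set.seed(1)                       ##  only for reproducibility
for (i in 1:4) {
  plot(rnorm(50), main = paste("Sample", i), pch = 20, ylab = "")
}
par(old_par)                      ##  restore the previous layout
```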

Adding Lines and Segments to a Plot

We can use the segments() and lines() functions to add lines and segments to an existing plot.

  • lwd modifies line width
  • col modifies line color
  • lty modifies line type (e.g., 1 = solid, 2 = dashed, 3 = dotted)

##  Adding segments to a plot
plot(x2, y2, xlim = c(-0.25, 10.25), ylim = c(-0.25, 10.25), pch = 20)
segments(
  x0 = c(0,   0, 10, 10, 0,   0), y0 = c(0,  10, 10,  0, 0,  10),
  x1 = c(0,  10, 10,  0, 10, 10), y1 = c(10, 10,  0,  0, 10,  0),
  lwd = 2, col = 2, lty = 2  ##  line width 2, second colour of the palette, dashed line
)
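segments() is vectorized: segment i joins (x0[i], y0[i]) to (x1[i], y1[i]), so the six segments in the call above draw the border of the square and its two diagonals. A sketch using the same vectorization (and argument recycling for the scalar x0 and x1) to show the line types 1 to 6 in one call; the empty canvas plot(NULL, ...) and the name lty_codes are illustrative choices.

```r
lty_codes <- 1:6
##  Empty canvas with fixed limits; the y axis is drawn separately below
plot(NULL, xlim = c(0, 10), ylim = c(0.5, 6.5), xlab = "", ylab = "lty", yaxt = "n")
axis(2, at = lty_codes, las = 1)
##  One horizontal segment per line type: x0 and x1 are recycled
segments(x0 = 0, y0 = lty_codes, x1 = 10, y1 = lty_codes, lty = lty_codes, lwd = 2)
```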

Customizing Plot Markers

The graphical parameters of plot() are vectorized: we can pass vectors to arguments such as color (col), point type (pch), and size (cex), and the i-th value is applied to the i-th point.

plot(x1, y1, col = 1:10, pch = 1:10, cex = 1:10 / 2, lwd = 3, xlab = "", ylab = "", xlim = c(0, 11), ylim = c(0, 11))
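When a parameter vector is shorter than the number of points, R recycles it. A small sketch: the colours alternate between two values, and the symbol is chosen by a logical condition through ifelse() (the threshold 5 is an arbitrary illustrative choice).

```r
x1 <- 1:10
y1 <- 10:1
##  col has length 2 and is recycled; pch is computed point by point
plot(x1, y1, col = c("red", "blue"), pch = ifelse(x1 > 5, 17, 19), cex = 2)
```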

Plotting Functions: lines()

The lines() function adds to an existing plot a line joining the given (x, y) coordinates. Its interface is similar to plot(), which makes it easy to overlay several functions.

x3 <- seq(-5, 5, by = 0.1)
plot(x3, sin(x3), type = "l", ylab = "", xlab = "x", lwd = 2, col = "red")
lines(x3, cos(x3), lwd = 2, col = "blue")
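When several curves share one plot, a legend helps to tell them apart. A sketch extending the example above; abline() and legend() are base graphics functions not covered in these slides, and the legend position is an arbitrary choice.

```r
x3 <- seq(-5, 5, by = 0.1)
plot(x3, sin(x3), type = "l", ylab = "", xlab = "x", lwd = 2, col = "red")
lines(x3, cos(x3), lwd = 2, col = "blue")
abline(h = 0, lty = 3)  ##  dotted horizontal reference line at y = 0
legend("bottomleft", legend = c("sin(x)", "cos(x)"), col = c("red", "blue"), lwd = 2)
```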

Plotting a Histogram: hist()

##  Simple histogram
set.seed(1)
x <- rnorm(n = 500)
hist(x)
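Besides drawing, hist() returns an object of class "histogram" with the computed bins. A sketch that computes it without drawing (plot = FALSE) and inspects its components; the name h is illustrative.

```r
set.seed(1)
x <- rnorm(n = 500)
h <- hist(x, plot = FALSE)  ##  compute the histogram without drawing it
class(h)                    ##  "histogram"
h$breaks                    ##  bin edges
h$counts                    ##  number of observations in each bin
plot(h)                     ##  the stored histogram can be drawn later
```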

Modifying the Number of Breakpoints

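We can control the number of breakpoints using the breaks argument. A single number is only a suggestion: R uses pretty() to choose “nice” breakpoints, so the actual number of bins may differ. To fix the bins exactly, pass a vector of breakpoints instead.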

##  Adjusting the number of breaks
hist(x, breaks = 30)
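A sketch contrasting the two uses of breaks: first count the bins R actually used for breaks = 30, then pass an explicit vector of 31 equally spaced edges, which yields exactly 30 bins. The names h30 and my_breaks are illustrative.

```r
set.seed(1)
x <- rnorm(n = 500)
##  breaks = 30 is only a suggestion: count the bins R actually used
h30 <- hist(x, breaks = 30, plot = FALSE)
length(h30$counts)
##  a vector of breakpoints fixes the bins exactly: 31 edges give 30 bins
my_breaks <- seq(min(x), max(x), length.out = 31)
hist(x, breaks = my_breaks)
```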

Histogram with Density Overlay

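With freq = FALSE, hist() plots densities instead of counts (relative frequency divided by bin width), so the total area of the bars is 1 and the histogram is on the same scale as a density estimate. The density() function can then be used to estimate the probability density of the data.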

##  Overlaying density on a histogram
hist(x, breaks = 30, freq = FALSE)
lines(density(x), lwd = 2, col = grey(0.2), lty = 2)
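A quick check of why the two layers are comparable: with freq = FALSE, each bar height is count / (n × bin width), so the bar areas sum to 1, just like the area under a density. A sketch of that check on the same simulated data:

```r
set.seed(1)
x <- rnorm(n = 500)
h <- hist(x, breaks = 30, freq = FALSE)
##  total bar area: height times width, summed over the bins (equal to 1 up to rounding)
sum(h$density * diff(h$breaks))
lines(density(x), lwd = 2, col = grey(0.2), lty = 2)
```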

Kernel Density Estimation: density()

The density() function provides a non-parametric (kernel) estimate of the probability density function. The bandwidth bw controls the amount of smoothing: larger values give smoother estimates.

##  Plotting density estimates with different bandwidths
plot(density(x, bw = 1))  ##  large bandwidth: more smoothing

plot(density(x, bw = 0.1)) ##  small bandwidth: less smoothing, wigglier estimate
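When bw is not given, density() selects it with Silverman's rule of thumb (bw = "nrd0", implemented by bw.nrd0()). A sketch overlaying the default estimate, the two bandwidths above, and the true N(0, 1) density from which the data were simulated; the object name d_default is illustrative.

```r
set.seed(1)
x <- rnorm(n = 500)
d_default <- density(x)   ##  default bandwidth: bw = "nrd0" (Silverman's rule of thumb)
d_default$bw              ##  the bandwidth actually selected
plot(d_default, lwd = 2, main = "Effect of the bandwidth")
lines(density(x, bw = 1), col = "red", lty = 2)     ##  large bw: oversmoothed
lines(density(x, bw = 0.1), col = "blue", lty = 3)  ##  small bw: undersmoothed
curve(dnorm, add = TRUE, col = "darkgreen")         ##  true N(0, 1) density
```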

Drawing Functions with curve()

The curve() function plots a mathematical expression written in terms of x, or a function of one argument, over the interval [from, to]. Here’s an example with a cubic polynomial.

##  Plot a cubic function using curve()
curve(expr = x ^ 3 - x ^ 2 - 3 * x, from = -2, to = 2.5)
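curve() also accepts the name of an R function, and its n argument sets the number of evaluation points (101 by default). A sketch, assuming we define the same cubic as a function f and mark its roots: factoring gives f(x) = x (x^2 - x - 3), so the roots are 0 and (1 ± √13)/2.

```r
##  The same cubic, written as an R function
f <- function(x) x^3 - x^2 - 3 * x
curve(f, from = -2, to = 2.5, n = 201)
abline(h = 0, lty = 3)  ##  reference line at y = 0
##  Roots from the factorization f(x) = x (x^2 - x - 3)
roots <- c((1 - sqrt(13)) / 2, 0, (1 + sqrt(13)) / 2)
points(roots, f(roots), pch = 19, col = "red")
```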

Plotting Probability Density Functions

The curve() function can also be used to plot predefined functions like dnorm(), the probability density function of the normal distribution (standard normal by default).

##  Plot the standard normal distribution
curve(dnorm, from = -3, to = 3)
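Passing an expression in x lets us set the distribution parameters, and add = TRUE overlays further curves on the current plot. A sketch comparing normal densities with three standard deviations (the values 0.5, 1 and 2 are illustrative); the narrowest curve is drawn first so that its peak fits in the plotting region.

```r
curve(dnorm(x, mean = 0, sd = 0.5), from = -5, to = 5, lwd = 2, ylab = "density")
curve(dnorm(x, mean = 0, sd = 1), add = TRUE, lwd = 2, col = "red")
curve(dnorm(x, mean = 0, sd = 2), add = TRUE, lwd = 2, col = "blue")
legend("topright", legend = paste("sd =", c(0.5, 1, 2)),
       col = c("black", "red", "blue"), lwd = 2)
```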

Customizing the curve() Plot

##  Customized curve plot with labels and styling
curve(expr = x ^ 3 - x ^ 2 - 3 * x,
  from = -2, to = 2.5, lwd = 2, col = 2,
  main = bquote(f(x) == x^3 - x^2 - 3 * x), ##  see ?plotmath
  xlab = "", ylab = "", cex.axis = 1.25, cex.main = 2, lty = 2
)
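Inside bquote(), a term wrapped in .() is replaced by its current value, which is handy for titles that depend on a parameter. A minimal sketch with a hypothetical coefficient a:

```r
a <- 2  ##  hypothetical coefficient
##  curve() finds a in the calling environment; .(a) inserts its value in the title
curve(a * x^2, from = -2, to = 2, lwd = 2, ylab = "",
      main = bquote(f(x) == .(a) * x^2))
```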

Empirical Cumulative Distribution Function (ECDF)

The ecdf() function computes the ECDF of a numeric vector x and returns it as a step function: its value at t is the proportion of values in x that are less than or equal to t.

Plotting the ECDF of a small sample

x <- c(1, 2, 3)
plot(ecdf(x))
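Since ecdf() returns a function, it can be evaluated at any t. A sketch on the same three-point sample; the name Fn is illustrative. At t = 2.5, two of the three values are ≤ 2.5, so Fn(2.5) equals mean(x <= 2.5), that is 2/3.

```r
x <- c(1, 2, 3)
Fn <- ecdf(x)           ##  a step function, not a number
Fn(c(0.5, 1, 2.5, 3))   ##  proportion of values <= t at each t
mean(x <= 2.5)          ##  the same quantity computed directly
```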

Comparing ECDF and Theoretical CDF

##  Generating random normal data and comparing ECDF with CDF
set.seed(1)
x <- rnorm(100)
plot(ecdf(x), cex = 0.1) ##  ECDF plot
curve(pnorm, add = TRUE, col = 2, lwd = 2) ##  Overlay theoretical CDF
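The largest vertical gap between the two curves is the Kolmogorov-Smirnov statistic. A sketch computing it by hand, assuming no ties in the sample: at the i-th sorted observation the ECDF jumps from (i - 1)/n to i/n, so both values are compared with the theoretical CDF. The result can be cross-checked with ks.test() from the stats package.

```r
set.seed(1)
x <- rnorm(100)
Fn <- ecdf(x)
xs <- sort(x)
n <- length(xs)
##  gaps just after (i/n) and just before ((i - 1)/n) each jump; assumes no ties
D <- max(abs(Fn(xs) - pnorm(xs)), abs(Fn(xs) - 1 / n - pnorm(xs)))
D
plot(Fn, cex = 0.1)
curve(pnorm, add = TRUE, col = 2, lwd = 2)
```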

Homework: have fun with plots!

  • Come up with nice visualizations for the Spotify dataset!
  • Available here
  • In particular, try to use commands we have not had time to cover:
    • barplot()
    • boxplot()
    • More?
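As a starting point for barplot() and boxplot(), a sketch on the built-in mtcars data set, used here only as a stand-in because the columns of the Spotify dataset are not described in these slides: a bar chart of a categorical variable and box plots of a numeric variable by group.

```r
##  Stand-in data: the built-in mtcars data set (not the Spotify data)
counts <- table(mtcars$cyl)  ##  frequency table of a categorical variable
barplot(counts, xlab = "Number of cylinders", ylab = "Count", col = "steelblue")
##  Formula interface: one box of mpg for each value of cyl
boxplot(mpg ~ cyl, data = mtcars, xlab = "Number of cylinders", ylab = "Miles per gallon")
```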